Bushehr


Lotus at SemEval-2025 Task 11: RoBERTa with Llama-3 Generated Explanations for Multi-Label Emotion Classification

Ranjbar, Niloofar, Baghbani, Hamed

arXiv.org Artificial Intelligence

This paper presents a novel approach for multi-label emotion detection, where Llama-3 is used to generate explanatory content that clarifies ambiguous emotional expressions, thereby enhancing RoBERTa's emotion classification performance. By incorporating explanatory context, our method improves F1-scores, particularly for emotions like fear, joy, and sadness, and outperforms text-only models. The addition of explanatory content helps resolve ambiguity, addresses challenges like overlapping emotional cues, and enhances multi-label classification, marking a significant advancement in emotion detection tasks.


What is Wrong with Perplexity for Long-context Language Modeling?

Fang, Lizhe, Wang, Yifei, Liu, Zhaoyang, Zhang, Chenheng, Jegelka, Stefanie, Gao, Jinyang, Ding, Bolin, Wang, Yisen

arXiv.org Artificial Intelligence

Handling long-context inputs is crucial for large language models (LLMs) in tasks such as extended conversations, document summarization, and many-shot in-context learning. While recent approaches have extended the context windows of LLMs and employed perplexity (PPL) as a standard evaluation metric, PPL has proven unreliable for assessing long-context capabilities. The underlying cause of this limitation has remained unclear. In this work, we provide a comprehensive explanation for this issue. We find that PPL overlooks key tokens, which are essential for long-context understanding, by averaging across all tokens and thereby obscuring the true performance of models in long-context scenarios. To address this, we propose LongPPL, a novel metric that focuses on key tokens by employing a long-short context contrastive method to identify them. Our experiments demonstrate that LongPPL strongly correlates with performance on various long-context benchmarks (e.g., Pearson correlation of -0.96), significantly outperforming traditional PPL in predictive accuracy. Additionally, we introduce LongCE (Long-context Cross-Entropy) loss, a re-weighting strategy for fine-tuning that prioritizes key tokens, leading to consistent improvements across diverse benchmarks. In summary, these contributions offer deeper insights into the limitations of PPL and present effective solutions for accurately evaluating and enhancing the long-context capabilities of LLMs. Code is available at https://github.com/PKU-ML/LongPPL.
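
To make the averaging argument in this abstract concrete, the following is a minimal sketch, not the authors' implementation (which lives in the linked repository). It assumes per-token log-probabilities are already available under a long and a short context. The function names, the `min_gain` threshold, and the rule "a token is key if its log-probability improves by at least `min_gain` with the long context" are all illustrative assumptions.

```python
import math

def perplexity(log_probs):
    """Standard PPL: exp of the mean negative log-likelihood over all tokens."""
    return math.exp(-sum(log_probs) / len(log_probs))

def key_token_perplexity(long_lp, short_lp, min_gain=2.0):
    """PPL over 'key' tokens only.

    Assumption (hypothetical rule): a token is key when its log-probability
    under the long context exceeds the short-context one by >= min_gain.
    The PPL is then computed from the long-context log-probs of those tokens.
    """
    key = [l for l, s in zip(long_lp, short_lp) if l - s >= min_gain]
    if not key:
        raise ValueError("no key tokens selected")
    return math.exp(-sum(key) / len(key))
```

With long-context log-probs `[-0.1, -0.2, -0.1, -3.0]` and short-context log-probs `[-0.1, -0.2, -0.1, -6.0]`, only the last token depends on the long context. The plain average is dominated by the three easy tokens, while the key-token variant reports the poor score on the one token that matters.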


Improving the quality of Persian clinical text with a novel spelling correction system

Dashti, Seyed Mohammad Sadegh, Dashti, Seyedeh Fatemeh

arXiv.org Artificial Intelligence

Background: The accuracy of spelling in Electronic Health Records (EHRs) is a critical factor for efficient clinical care, research, and ensuring patient safety. The Persian language, with its abundant vocabulary and complex characteristics, poses unique challenges for real-word error correction. This research aimed to develop an innovative approach for detecting and correcting spelling errors in Persian clinical text. Methods: Our strategy employs a state-of-the-art pre-trained model that has been meticulously fine-tuned specifically for the task of spelling correction in the Persian clinical domain. This model is complemented by an innovative orthographic similarity matching algorithm, PERTO, which uses visual similarity of characters for ranking correction candidates. Results: The evaluation of our approach demonstrated its robustness and precision in detecting and rectifying word errors in Persian clinical text. In terms of non-word error correction, our model achieved an F1-Score of 90.0% when the PERTO algorithm was employed. For real-word error detection, our model demonstrated its highest performance, achieving an F1-Score of 90.6%. Furthermore, the model reached its highest F1-Score of 91.5% for real-word error correction when the PERTO algorithm was employed. Conclusions: Despite certain limitations, our method represents a substantial advancement in the field of spelling error detection and correction for Persian clinical text. By effectively addressing the unique challenges posed by the Persian language, our approach paves the way for more accurate and efficient clinical documentation, contributing to improved patient care and safety. Future research could explore its use in other areas of the Persian medical domain, enhancing its impact and utility.
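
The abstract describes ranking correction candidates by the visual similarity of characters. The sketch below illustrates that general idea with a weighted edit distance. It is not the PERTO algorithm, whose details are not given here. The letter groups, the 0.5 substitution cost, and all function names are assumptions chosen for illustration.

```python
# Groups of Persian letters that roughly share a base shape and differ mainly
# in dots or strokes. Illustrative and incomplete; not the PERTO tables.
SIMILAR_GROUPS = ["بپتث", "جچحخ", "دذ", "رزژ", "سش", "صض", "طظ", "عغ", "فق", "کگ"]
GROUP_OF = {ch: i for i, group in enumerate(SIMILAR_GROUPS) for ch in group}

def sub_cost(a, b):
    """Substitution cost: 0 if equal, 0.5 if look-alike (assumed), else 1."""
    if a == b:
        return 0.0
    ga, gb = GROUP_OF.get(a), GROUP_OF.get(b)
    return 0.5 if ga is not None and ga == gb else 1.0

def visual_distance(s, t):
    """Levenshtein distance with cheaper substitutions for look-alike letters."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1.0,                 # delete a
                            curr[j - 1] + 1.0,             # insert b
                            prev[j - 1] + sub_cost(a, b))) # substitute
        prev = curr
    return prev[-1]

def rank_candidates(word, candidates):
    """Order candidates from most to least visually similar to word."""
    return sorted(candidates, key=lambda c: visual_distance(word, c))
```

For example, `rank_candidates("پا", ["ما", "با"])` places "با" first, because پ and ب belong to the same look-alike group while پ and م do not.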


An Expert System to Diagnose Spinal Disorders

Dashti, Seyed Mohammad Sadegh, Dashti, Seyedeh Fatemeh

arXiv.org Artificial Intelligence

Objective: Until now, spinal disorders have been diagnosed only through traditional invasive approaches. Traditional manual diagnostics impose a high workload, and diagnostic errors are likely to occur when physicians work for prolonged periods. In this research, we develop an expert system based on a hybrid inference algorithm and comprehensive integrated knowledge to assist experts in the fast, high-quality diagnosis of spinal disorders. Methods: First, for each spinal anomaly, accurate and integrated knowledge was acquired from related experts and resources. Second, based on probability distributions and dependencies between the symptoms of each anomaly, a unique numerical value known as the certainty effect value was assigned to each symptom. Third, a new hybrid inference algorithm, combining Backward Chaining Inference and the Theory of Uncertainty, was designed to obtain excellent performance. Results: The proposed expert system was evaluated in two phases: real-world sample evaluation and medical record evaluation. In the real-world sample analysis, the system achieved excellent accuracy. Applying the system to samples with anomalies revealed the degree of severity of disorders and the risk of development of abnormalities in unhealthy and healthy patients. In the medical record analysis, our expert system showed promising performance, very close to that of experts. Conclusion: Evaluations suggest that the proposed expert system provides promising performance, helping specialists to validate the accuracy and integrity of their diagnoses. It can also serve as intelligent educational software for medical students to gain familiarity with the spinal disorder diagnosis process and related symptoms.
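
The combination of backward chaining with uncertainty handling mentioned in this abstract can be sketched as follows. This is not the paper's hybrid algorithm or its certainty effect values. It assumes a MYCIN-style certainty-factor combination, an acyclic rule base, and rules with at least one premise; the rule format and every name and number are hypothetical.

```python
def combine(cf_a, cf_b):
    """MYCIN-style combination of two positive certainty factors in [0, 1]."""
    return cf_a + cf_b * (1.0 - cf_a)

def certainty(goal, rules, facts):
    """Backward-chain from goal and return the certainty that it holds.

    rules maps a conclusion to a list of (premises, rule_cf) pairs;
    facts maps an observed symptom to its certainty in [0, 1].
    Assumes the rule base has no cycles and every rule has a premise.
    """
    if goal in facts:
        return facts[goal]
    total = 0.0
    for premises, rule_cf in rules.get(goal, []):
        # A rule is only as certain as its weakest premise.
        support = min(certainty(p, rules, facts) for p in premises)
        total = combine(total, support * rule_cf)
    return total
```

As a usage illustration with made-up values: given `rules = {"disc_herniation": [(["leg_pain", "numbness"], 0.8), (["back_pain"], 0.4)]}` and `facts = {"leg_pain": 0.9, "numbness": 0.7, "back_pain": 1.0}`, the first rule contributes 0.7 × 0.8 = 0.56, the second contributes 0.4, and the combined certainty is 0.56 + 0.4 × (1 − 0.56) = 0.736.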


Iran's Long Night Is Capped by an Earthquake

NYT > Middle East

It had already been an eventful day in Iran: The country had just launched missiles at United States forces based in Iraq and an airliner carrying at least 176 people crashed shortly after takeoff from Tehran on Wednesday, killing everyone on board. Then just before dawn, a 4.5-magnitude earthquake struck southern Iran at a depth of about six miles, the United States Geological Survey reported, in the same region as the troubled Bushehr nuclear power plant. It struck just as Iranian leaders were trumpeting their strike on two Iraqi bases housing United States forces, in retaliation for last week's American drone strike that killed Maj. No casualties were immediately reported, but rescue teams were working at the site, Jahangir Dehqani, managing director of the Bushehr crisis management agency, told the state-run IRNA news agency. The quake was reported about 30 miles from the Russian-built Bushehr nuclear plant.